Investigating Recurrence and Eligibility Traces in Deep Q-Networks
Authors
Abstract
Eligibility traces in reinforcement learning provide a bias-variance trade-off and can often speed up training by propagating knowledge back over many time-steps in a single update. We investigate the use of eligibility traces in combination with recurrent networks in the Atari domain. We illustrate the benefits of both recurrent nets and eligibility traces in some Atari games, and also highlight the importance of the optimization method used in training.
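As a concrete illustration of how a single update can propagate information over many time-steps, the sketch below computes recursive λ-returns over a sampled trajectory in the style of Peng's Q(λ); it is a minimal sketch, not code from the paper, and the function and argument names (`lambda_returns`, `q_next_max`) are hypothetical.

```python
import numpy as np

def lambda_returns(rewards, q_next_max, dones, gamma=0.99, lam=0.8):
    """Recursive lambda-returns over one trajectory (Peng's Q(lambda) style).

    rewards[t]    : reward received at step t
    q_next_max[t] : max_a Q(s_{t+1}, a), bootstrapped value of the next state
    dones[t]      : True if the episode terminated at step t
    """
    T = len(rewards)
    G = np.zeros(T)
    # Bootstrap the final step from the next state's estimated value.
    G[-1] = rewards[-1] + gamma * q_next_max[-1] * (not dones[-1])
    for t in reversed(range(T - 1)):
        # Blend the one-step bootstrap with the longer lambda-return.
        bootstrap = (1 - lam) * q_next_max[t] + lam * G[t + 1]
        G[t] = rewards[t] + gamma * bootstrap * (not dones[t])
    return G  # regression targets for Q(s_t, a_t)
```

With lam=0 this reduces to ordinary one-step Q-learning targets, and with lam=1 it approaches a Monte Carlo return; that dial is the bias-variance trade-off the abstract refers to.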
Similar resources
Using Sliding Mode Controller and Eligibility Traces for Controlling the Blood Glucose in Diabetic Patients at the Presence of Fault
Some people with diabetes use insulin injection pumps to control their blood glucose level. Faults can occur in the sensor or actuator of these pumps. The main objective of this paper is to control the blood glucose level at the desired level and to provide fault-tolerant control of these injection pumps. To this end, the eligibility traces algorithm is combined with the sliding mod...
Bidding Strategy on Demand Side Using Eligibility Traces Algorithm
Restructuring in the power industry splits the sector into separate parts and creates competition between the purchasing and selling sides. As a consequence, through active participation in the energy market, service-provider companies and large consumers create a context for overcoming the problems resulting from the lack of demand-side participation in the market. The most prominent ch...
Opposition-Based Q(λ) with Non-Markovian Update
The OQ(λ) algorithm benefits from an extension of eligibility traces introduced as the opposition trace. This new technique combines the idea of opposition with eligibility traces to deal with large state-space problems in reinforcement learning applications. In our previous work, the comparison of the results of OQ(λ) and conventional Watkins' Q(λ) reflected a remarkable increase in perf...
A Unified Approach for Multi-step Temporal-Difference Learning with Eligibility Traces in Reinforcement Learning
Recently, a new multi-step temporal-difference learning algorithm, called Q(σ), unified n-step Tree-Backup (when σ = 0) and n-step Sarsa (when σ = 1) by introducing a sampling parameter σ. However, like other multi-step temporal-difference learning algorithms, Q(σ) requires substantial memory and computation time. The eligibility trace is an important mechanism for transforming off-line updates into e...
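To make the unification concrete, here is a minimal sketch of the one-step Q(σ) backup target; the notation and function name are assumptions for illustration, not taken from the cited paper. σ interpolates between the sampled Sarsa target and the Tree-Backup expectation.

```python
import numpy as np

def q_sigma_target(r, q_next, pi_next, a_next, sigma, gamma=0.99):
    """One-step Q(sigma) target for a transition (s, a, r, s', a').

    q_next  : Q(s', .) action-value estimates at the next state
    pi_next : target-policy probabilities pi(. | s')
    a_next  : the action actually sampled at s'
    sigma=1 -> Sarsa (pure sampling); sigma=0 -> Tree-Backup (pure expectation)
    """
    expected = float(np.dot(pi_next, q_next))  # E_pi[Q(s', .)]
    sampled = q_next[a_next]                   # Q(s', a') for the sampled action
    return r + gamma * (sigma * sampled + (1 - sigma) * expected)
```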
From Q(λ) to Average Q-learning: Efficient Implementation of an Asymptotic Approximation
Q(λ) is a reinforcement learning algorithm that combines Q-learning and TD(λ). Online implementations of Q(λ) that use eligibility traces have been shown to speed up basic Q-learning. In this paper we present an asymptotic analysis of Watkins' Q(λ) with accumulating eligibility traces. We first introduce an asymptotic approximation of Q(λ) that appears to be a gain-matrix variant of basic Q-learnin...
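For reference, a minimal tabular sketch of Watkins' Q(λ) with accumulating traces, assuming a hypothetical `env` interface with `reset()` and `step(a)`; it is not code from the cited paper. One TD error updates every state-action pair with a non-zero trace, and traces are cut whenever a non-greedy action is taken.

```python
import numpy as np

def watkins_q_lambda(env, Q, alpha=0.1, gamma=0.99, lam=0.9, eps=0.1):
    """One episode of tabular Watkins' Q(lambda) with accumulating traces.

    Assumes env.reset() -> state index, env.step(a) -> (state, reward, done),
    and Q an (n_states, n_actions) array updated in place.
    """
    n_actions = Q.shape[1]
    E = np.zeros_like(Q)  # eligibility traces
    s = env.reset()
    a = np.random.randint(n_actions) if np.random.rand() < eps else int(np.argmax(Q[s]))
    done = False
    while not done:
        s2, r, done = env.step(a)
        # Next action (epsilon-greedy) and the greedy action at s2.
        a2 = np.random.randint(n_actions) if np.random.rand() < eps else int(np.argmax(Q[s2]))
        a_star = int(np.argmax(Q[s2]))
        if Q[s2, a2] == Q[s2, a_star]:
            a_star = a2  # break ties in favour of the taken action
        delta = r + gamma * Q[s2, a_star] * (not done) - Q[s, a]
        E[s, a] += 1.0                 # accumulating trace
        Q += alpha * delta * E         # one TD error reaches many past pairs
        # Watkins' rule: decay traces after greedy actions, cut them otherwise.
        E *= gamma * lam if a2 == a_star else 0.0
        s, a = s2, a2
    return Q
```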
Journal: CoRR
Volume: abs/1704.05495
Pages: -
Publication date: 2017